Rank | Count | Beginning |
---|---|---|
6785 | 439 | या |
9673 | 191 | हे |
9350 | 143 | हा |
3391 | 110 | ते |
9546 | 110 | ही |
1264 | 93 | इ.स. |
3616 | 85 | त्या |
4288 | 85 | त्यांनी |
3918 | 78 | त्यामुळे |
9899 | 78 | ह्या |
3177 | 74 | तसेच |
4922 | 71 | पण |
3811 | 68 | त्यानंतर |
4126 | 67 | त्यांचे |
4194 | 47 | त्यांच्या |
7624 | 44 | येथे |
5265 | 41 | पुढे |
6030 | 38 | भारतीय |
8800 | 38 | सर्वात |
5048 | 37 | परंतु |
3562 | 36 | तो |
6962 | 32 | याचे |
759 | 30 | अशा |
1947 | 29 | काही |
6384 | 29 | मात्र |
7237 | 29 | यामध्ये |
825 | 28 | असे |
7590 | 28 | येथील |
1072 | 26 | आपल्या |
4086 | 26 | त्यांचा |
The next four subsections show the most frequent sentence beginnings consisting of N words, for N=1, 2, 3, 4. This subsection starts with N=1.
The most frequent word-N-grams at the beginning of sentences give some insight into sentence composition.
Especially for N=1, we only need a small corpus to identify the most frequent sentence beginnings.
select substring_index(sentence, ' ', 1) as beg, count(*) as cnt from sentences group by substring_index(sentence, ' ', 1) order by cnt desc limit 50;
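The same count can be sketched outside the database. The following Python snippet is a minimal illustration, not the pipeline used to produce the table above; it assumes the sentences are available as a plain list of strings, and the function name `sentence_beginnings` and the three sample sentences are made up for the example. Splitting on a single space mirrors `substring_index(sentence, ' ', N)`, so the parameter `n` covers the cases N=1, 2, 3, 4.

```python
from collections import Counter


def sentence_beginnings(sentences, n=1, limit=50):
    """Return the `limit` most frequent beginnings made of the first n words.

    Words are split on a single space, like substring_index(sentence, ' ', n):
    a sentence with fewer than n words is counted as a whole.
    """
    counts = Counter(" ".join(s.split(" ")[:n]) for s in sentences)
    return counts.most_common(limit)


# Hypothetical toy input; the real data lives in the `sentences` table.
sample = [
    "या गावात शाळा आहे.",
    "या नदीचे पाणी गोड आहे.",
    "हे शहर मोठे आहे.",
]

for beginning, count in sentence_beginnings(sample, n=1):
    print(count, beginning)
```

Calling `sentence_beginnings(sample, n=2)` counts two-word beginnings in the same way. On a toy input every two-word beginning is likely to occur only once, which illustrates why larger N needs a larger corpus than N=1.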
4.3.1.2 Most Frequent Sentence Beginnings II
4.3.1.3 Most Frequent Sentence Beginnings III
4.3.1.4 Most Frequent Sentence Beginnings IV
4.3.1.1 Most Frequent Sentence Endings I
4.3.1.2 Most Frequent Sentence Endings II
4.3.1.3 Most Frequent Sentence Endings III
4.3.1.4 Most Frequent Sentence Endings IV